Anderson Banihirwe
I contribute to and maintain several libraries in the open source scientific Python stack, with a focus on scaling Python tools to handle petabyte-scale datasets on HPC and cloud platforms.
Education
B.S., Computer Systems Engineering
University of Arkansas at Little Rock
Little Rock, AR
2018 - 2014
Professional Experience
Software Engineer
National Center for Atmospheric Research
Boulder, CO
current - 2018-10
- Project lead & core maintainer of intake-esm, a Python package for loading cataloged Earth System Model data on an HPC system or in the cloud.
- Contributing to existing open source software libraries, namely Dask and xarray.
- Designing and building data analysis tools leveraging existing open source software for scientific computing in Python.
Software Developer Intern
Quansight
Austin, TX
2018-09 - 2018-05
- Developed xndframes, a pandas ExtensionDtype/Array backed by xnd, a library that refactors NumPy capabilities into low-level libraries and high-level interfaces.
- Worked on integrating cuDF, a GPU DataFrame library, with the Apache Arrow library.
- Worked closely with a customer to port an existing Postgres code base to a Dask-based workflow.
Data Science Intern
First Orion
Little Rock, AR
2018-04 - 2017-11
- Built scoring and predictive models with scikit-learn, Dask, and Apache Spark using First Orion’s proprietary telecommunication data.
Research Intern
National Center for Atmospheric Research
Boulder, CO
2017-08 - 2017-05
- Developed spark-xarray, a Python package that integrates PySpark and xarray for climate data analysis.
Selected Publications, Posters, and Talks
The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC
2019 Supercomputing Conference Workshop on Interactive High-Performance Computing
2020
- Authored with Tina Erica Odaka, Guillaume Eynard-Bontemps, Aurelien Ponte, Guillaume Maze, Kevin Paul, Jared Baker, Ryan Abernathey.
Intake / Pangeo Catalog: Making It Easier To Consume Earth’s Climate and Weather Data
2020 EarthCube Annual Meeting
2020
- Contributed Jupyter notebook about Pangeo’s data cataloging efforts.
Interactive Supercomputing with Dask and Jupyter
2019 Scientific Computing with Python conference
Austin, TX
2019
- Contributed talk about Dask and Jupyter.
- Recorded talk
- Slides
Beyond Matplotlib - Tutorial: Building Interactive Climate Data Visualizations with Bokeh and Friends
2018 UCAR Software Engineering Assembly conference
Boulder, CO
2018
- Contributed tutorial about interactive visualization with Python.
- Tutorial materials
PySpark for “Big” Atmospheric Data Analysis
Eighth Symposium on Advances in Modeling and Analysis Using Python
Austin, TX
2018
- Contributed talk about spark-xarray.
- Recorded talk
- Slides